In [1]:
import pandas as pd
from urllib.request import urlopen
import json
import warnings
warnings.filterwarnings("ignore")
Let's make a choropleth map with Pokemon statistics. The color of each county should correspond to the number of Pokemon found there. You can download the data from Canvas (pokemon.csv); it is a subset of the Pokemon data from Kaggle.
We'll also need an SVG map of US counties. You can download it from Wikipedia (the file used below is USA_Counties_with_FIPS_and_names.svg).
If you open the SVG in a text editor, you'll see many <path> tags. Each of these is a county. We want to change their style attributes, specifically the fill color, so that the darkness of the fill corresponds to the number of Pokemon in each county.
Each path in the SVG also has an id attribute, which holds something called a FIPS code. FIPS stands for Federal Information Processing Standard. Every county has a unique FIPS code, and it’s how we are going to associate each path with our Pokemon data.
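To see what such a file looks like to a program, here is a minimal sketch that parses a tiny SVG fragment with Python's standard-library xml.etree.ElementTree and prints each path's id and style. The two paths, their coordinates, ids, and styles are made up for illustration; the real file has thousands of paths with much longer outlines.

```python
import xml.etree.ElementTree as ET

# A made-up two-path fragment; real county outlines have much longer "d" attributes.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
  <path id="01001" style="fill:#d0d0d0;stroke:#000000" d="M 0,0 L 10,0 L 10,10 Z"/>
  <path id="01003" style="fill:#d0d0d0;stroke:#000000" d="M 20,0 L 30,0 L 30,10 Z"/>
</svg>"""

# ElementTree writes namespaced tag names as {namespace-uri}localname.
SVG_NS = "{http://www.w3.org/2000/svg}"

root = ET.fromstring(SAMPLE_SVG)
for path in root.iter(SVG_NS + "path"):
    # Each path is one shape on the map; print its id and its inline style.
    print(path.get("id"), path.get("style"))
```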
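County FIPS codes are five digits: a two-digit state code followed by a three-digit county code (12 is Florida, so 12097 is a Florida county). SVG id attributes are text, so if a FIPS column ever ends up numeric, for example after the dataframe is saved to CSV and read back, a code like 01001 becomes 1001 and no longer matches a zero-padded id. A small sketch of two helpers (hypothetical names, not part of the assignment) that normalize and split codes:

```python
def normalize_fips(code) -> str:
    """Return a county FIPS code as a 5-character, zero-padded string.

    Accepts ints or strings, since reading a CSV may turn "01001" into 1001.
    """
    return str(int(code)).zfill(5)


def split_fips(code):
    """Split a county FIPS code into (state code, county code)."""
    fips = normalize_fips(code)
    return fips[:2], fips[2:]


print(normalize_fips(1001))   # 01001
print(split_fips("12097"))    # ('12', '097')
```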
For this we first need to do some data cleaning.
In [2]:
pokemon = pd.read_csv('pokemon.csv')
pokemon.head()
Out[2]:
The data only has latitude and longitude. To convert these to a FIPS code, we need reverse geocoding. The Federal Communications Commission (FCC) provides an API for this task.
The API works over HTTP, so we can use Python's urllib library to query it. For example:
In [3]:
res = urlopen("http://data.fcc.gov/api/block/find?format=json&latitude=28.35975&longitude=-81.421988").read().decode('utf-8')
res
Out[3]:
The result comes back as a JSON string, so we need to parse it with Python's json module.
In [4]:
json.loads(res)
Out[4]:
Now we can access it as a dictionary and get the county's FIPS code.
In [5]:
json.loads(res)['County']['FIPS']
Out[5]:
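We assume the lookup may not always return a county (for example, its County entry could be missing or null for a point outside the US). A slightly more defensive version of the extraction step is sketched below; the sample payloads are hand-written to mirror only the County → FIPS nesting used above, not the full FCC response, and the helper name is hypothetical.

```python
import json
from typing import Optional


def county_fips(payload: str) -> Optional[str]:
    """Pull the county FIPS code out of a JSON response string.

    Returns None when the 'County' entry or its 'FIPS' field is missing or null.
    """
    data = json.loads(payload)
    county = data.get("County") or {}  # treat a null County like a missing one
    return county.get("FIPS")


# Hand-written payloads with only the fields used here (assumed shape).
found = '{"County": {"FIPS": "12097"}}'
not_found = '{"County": {"FIPS": null}}'

print(county_fips(found))      # 12097
print(county_fips(not_found))  # None
print(county_fips("{}"))       # None
```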
We can do this for every row in the dataframe. The pandas apply method is handy here: you write a function that handles one row, and apply calls it on each row when you pass axis=1.
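Here is the row-wise apply pattern on a toy dataframe, with a made-up lookup dictionary standing in for the network call so it runs offline. The coordinates and codes are invented for illustration.

```python
import pandas as pd

# Made-up points and a made-up lookup table standing in for the FCC API.
toy = pd.DataFrame({"latitude": [1.0, 2.0, 1.0], "longitude": [10.0, 20.0, 10.0]})
fake_lookup = {(1.0, 10.0): "00001", (2.0, 20.0): "00002"}


def toy_fips(row):
    # With axis=1, apply passes each row as a Series indexed by column name.
    return fake_lookup[(row["latitude"], row["longitude"])]


toy["FIPS"] = toy.apply(toy_fips, axis=1)
print(toy)
```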
In [6]:
# TODO: create a column in the dataframe called 'FIPS' for the FIPS codes.
# The result should look like the output below.
# Note that looking up all the lat-lon pairs may take some time.
def get_fips(row):
    # Query the FCC API with this row's coordinates and return the county FIPS code
    url = ("http://data.fcc.gov/api/block/find?format=json&latitude=" + str(row['latitude'])
           + "&longitude=" + str(row['longitude']))
    res = urlopen(url).read().decode('utf-8')
    return json.loads(res)['County']['FIPS']

pokemon['FIPS'] = pokemon.apply(get_fips, axis=1)
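The lookup URL in get_fips is built by string concatenation. The standard-library urllib.parse.urlencode can assemble the query string instead, handling the & and = separators and escaping values if ever needed. A sketch of an equivalent URL builder (the helper name fcc_block_url is our own):

```python
from urllib.parse import urlencode

BASE_URL = "http://data.fcc.gov/api/block/find"  # endpoint used in this notebook


def fcc_block_url(latitude, longitude):
    """Build the same lookup URL as get_fips, without manual concatenation."""
    query = urlencode({"format": "json", "latitude": latitude, "longitude": longitude})
    return BASE_URL + "?" + query


print(fcc_block_url(28.35975, -81.421988))
# http://data.fcc.gov/api/block/find?format=json&latitude=28.35975&longitude=-81.421988
```

Inside get_fips this could become urlopen(fcc_block_url(row['latitude'], row['longitude']), timeout=30); timeout is a standard urlopen argument that stops one slow request from hanging the whole loop.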
In [7]:
pokemon.head()
Out[7]:
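If the same coordinates appear more than once, each repeat triggers another HTTP request. One way to avoid that is to memoize the lookup with functools.lru_cache. The sketch below uses a hypothetical slow_lookup that only records its calls instead of hitting the API, so you can see how many requests would actually be made.

```python
from functools import lru_cache

import pandas as pd

calls = []  # records every "request" so we can count them


@lru_cache(maxsize=None)
def slow_lookup(latitude, longitude):
    # Hypothetical stand-in for the FCC request; a real version would call urlopen here.
    calls.append((latitude, longitude))
    return "FIPS-{}-{}".format(latitude, longitude)


points = pd.DataFrame({"latitude": [1.5, 1.5, 2.5], "longitude": [3.5, 3.5, 4.5]})
points["FIPS"] = points.apply(lambda r: slow_lookup(r["latitude"], r["longitude"]), axis=1)

print(len(points), "rows,", len(calls), "lookups")  # 3 rows, 2 lookups
```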
We want to color the counties by the number of Pokemon appearing in them, so all we need now is a table with each county's FIPS code and the number of Pokemon found there.
In [8]:
pokemon_density = pokemon.groupby('FIPS').size().reset_index(name='Count')
In [9]:
pokemon_density.head()
Out[9]:
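value_counts produces the same counts in one step; the difference is that its result is sorted by count rather than by FIPS code. A sketch comparing both on made-up FIPS codes:

```python
import pandas as pd

toy = pd.DataFrame({"FIPS": ["12097", "12095", "12097", "12097"]})

# groupby().size() sorts by FIPS; value_counts() sorts by count, largest first.
by_groupby = toy.groupby("FIPS").size().reset_index(name="Count")
by_counts = toy["FIPS"].value_counts().rename_axis("FIPS").reset_index(name="Count")

print(by_groupby)
print(by_counts)
```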
Now we can turn to our SVG file. We need to find the path for each county, and with over 3,000 counties we have to do this programmatically. For this, we can use the BeautifulSoup package, which is designed for parsing HTML and XML. An SVG is an XML file, so it can be handled the same way as HTML and other XML files.
In [10]:
from bs4 import BeautifulSoup
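Read in the SVG file.
In [11]:
with open('USA_Counties_with_FIPS_and_names.svg', 'r') as f:
    svg = f.read()
Parse it with BeautifulSoup. Passing 'xml' (which requires the lxml package) tells BeautifulSoup to treat the file as XML rather than HTML, which keeps the SVG structure intact when we write it back out.
In [12]:
soup = BeautifulSoup(svg, 'xml')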
BeautifulSoup's findAll() method finds all tags with a given name.
In [13]:
paths = soup.findAll('path')
In [14]:
paths[0]
Out[14]:
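A sketch of the same calls on a tiny made-up SVG fragment, showing that each path tag behaves like a dictionary of its attributes and that assigning to one changes the tree that gets written out. It uses "html.parser", which ships with bs4, so the snippet needs no extra parser installed; the ids and styles are invented.

```python
from bs4 import BeautifulSoup

# A made-up two-path fragment standing in for the real SVG file.
sample_svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="01001" style="fill:#d0d0d0" d="M 0,0 L 10,0 L 10,10 Z"/>
  <path id="01003" style="fill:#d0d0d0" d="M 20,0 L 30,0 L 30,10 Z"/>
</svg>"""

sample_soup = BeautifulSoup(sample_svg, "html.parser")
sample_paths = sample_soup.find_all("path")  # find_all is the newer name for findAll

for p in sample_paths:
    # Attribute access works like a dict lookup on the tag.
    print(p["id"], p["style"])

# Assigning an attribute writes it back into the parsed document.
sample_paths[0]["style"] = "fill:#0868ac"
print(sample_paths[0]["style"])  # fill:#0868ac
```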
We should also decide on the colors. ColorBrewer provides some nice palettes. Pick one of the sequential palettes and put its hexadecimal color codes into a list.
In [15]:
colors = ['#fef0d9', '#fdd49e', '#fdbb84','#fc8d59','#e34a33','#b30000']
In [16]:
# TODO: substitute the above with a palette of your choice.
colors = ['#f0f9e8','#bae4bc','#7bccc4','#43a2ca','#0868ac']
Now we’re going to change the style attribute of each path in the SVG. We’re only interested in the fill color, but to keep things simple we’re going to replace the entire style rather than parse it and change only the color. Define the style as follows:
In [17]:
path_style = 'font-size:12px;fill-rule:nonzero;stroke:#000000;stroke-opacity:1;\
stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;\
marker-start:none;stroke-linejoin:bevel'
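If you did want to change only the fill and keep each path's other style properties, a small helper like the hypothetical set_fill below would do it. It is a sketch that assumes simple property:value declarations separated by semicolons, with no semicolons inside values.

```python
def set_fill(style: str, color: str) -> str:
    """Return a copy of an inline CSS style string with its fill replaced (or added).

    Other properties keep their original order.
    """
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            key, value = decl.split(":", 1)
            props[key.strip()] = value.strip()
    props["fill"] = color  # overwrite in place, or append if there was no fill
    return ";".join(key + ":" + value for key, value in props.items())


print(set_fill("fill:#d0d0d0;stroke:#000000", "#0868ac"))  # fill:#0868ac;stroke:#000000
print(set_fill("stroke:#000000", "#0868ac"))               # stroke:#000000;fill:#0868ac
```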
In [18]:
# Look up each county's count once, keyed by FIPS code
counts = dict(zip(pokemon_density['FIPS'], pokemon_density['Count']))

for p in paths:
    cnt = counts.get(p.get('id'))
    if cnt is None:
        continue  # no Pokemon recorded here, or the path is not a county
    # TODO: decide color classes
    if cnt > 20:
        color_class = 4
    elif cnt > 15:
        color_class = 3
    elif cnt > 10:
        color_class = 2
    elif cnt > 5:
        color_class = 1
    else:
        color_class = 0
    color = colors[color_class]
    p['style'] = path_style + ';fill:' + color
Based on the number of Pokemon, we assign each county to a color class. For example, in the loop above a county with more than 20 Pokemon gets class 4 (the darkest color), one with 16 to 20 gets class 3, and so on down to class 0 for 5 or fewer.
Remember that we saved the SVG in the soup object. Now that we have changed the fill colors, we can simply write it out as a new file.
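The if/elif chain can also be written as a lookup with the standard-library bisect module: list the inclusive upper bound of each class and find where a count falls. A sketch assuming the same thresholds as the loop above (5, 10, 15, 20) and the five-color palette chosen earlier:

```python
from bisect import bisect_left

colors = ['#f0f9e8', '#bae4bc', '#7bccc4', '#43a2ca', '#0868ac']  # light to dark
thresholds = [5, 10, 15, 20]  # inclusive upper bounds of classes 0-3; above 20 is class 4


def color_class(count: int) -> int:
    # bisect_left returns how many thresholds are strictly below count,
    # so 0..5 -> 0, 6..10 -> 1, 11..15 -> 2, 16..20 -> 3, 21+ -> 4.
    return bisect_left(thresholds, count)


for count in [3, 5, 6, 12, 20, 25]:
    print(count, color_class(count), colors[color_class(count)])
```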
In [19]:
with open('svg_colored.svg', 'w') as g:
    g.write(soup.prettify())
Open the new SVG in your browser. You'll notice that only a few counties are colored. This is partly because we're only using a subset of the original data: the complete data has 296,021 rows, and looking up all the FIPS codes would take too long in class. If you're interested, you can download the full data and make the complete map.